Inferring Fine-Grained Data Provenance in Stream Data Processing: Reduced Storage Cost, High Accuracy
Abstract
Fine-grained data provenance ensures reproducibility of results in decision making, process control and e-science applications. However, maintaining this provenance is challenging in stream data processing because of its massive storage consumption, especially with large overlapping sliding windows. In this paper, we propose an approach to infer fine-grained data provenance by using a temporal data model and the coarse-grained data provenance of the processing. The approach has been evaluated on a real dataset, and the results show that the proposed inference method provides provenance information as accurate as explicit fine-grained provenance while consuming less storage.
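The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a time-based sliding window, timestamped input tuples (standing in for the temporal data model), and negligible processing delay. The names `InputTuple`, `infer_provenance` and `window_size` are hypothetical and chosen only for illustration. Instead of storing a link from every output to every contributing input, only the window size (coarse-grained provenance) is kept, and the contributing tuples are reconstructed from timestamps on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputTuple:
    timestamp: float  # arrival time in seconds (assumed to be stored with each tuple)
    value: float


def infer_provenance(inputs, output_time, window_size):
    """Return the input tuples assumed to lie in the window that produced an output.

    Assumption: the output at `output_time` was computed over all tuples with
    timestamps in the half-open interval (output_time - window_size, output_time].
    """
    start = output_time - window_size
    return [t for t in inputs if start < t.timestamp <= output_time]


# Ten tuples arriving once per second, at t = 1 .. 10.
inputs = [InputTuple(float(ts), float(ts) * 10) for ts in range(1, 11)]

# Coarse-grained provenance of the operation: a 5-second window.
contributors = infer_provenance(inputs, output_time=8.0, window_size=5.0)
contributor_times = [t.timestamp for t in contributors]
```

Under these assumptions the only stored provenance is the window size per operation, whereas explicit fine-grained provenance would record one link per contributing tuple per output; with overlapping windows the same tuple would be recorded many times.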
Similar resources
Probabilistic Inference of Fine-Grained Data Provenance
Decision making, process control and e-science applications process stream data, mostly produced by sensors. To control and monitor these applications, reproducibility of results is a vital requirement. However, it requires a massive amount of storage space to store fine-grained provenance data, especially for transformations with overlapping sliding windows. In this paper, we propose a proba...
Fine-Grained Provenance Inference for a Large Processing Chain with Non-materialized Intermediate Views
Many applications use a data processing chain, i.e. a workflow, to process data. Results of intermediate processing steps may not be persistent, since reproducing these results is not costly and they are hardly re-usable. However, in stream data processing where data arrives continuously, documenting fine-grained provenance explicitly for a processing chain to reproduce results is not a...
Efficient Stream Provenance via Operator Instrumentation
Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also for providing advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of o...
The Case for Fine-Grained Stream Provenance
The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we firs...
A Study on Data Repertory Acumen Schema to Manage Data Provenance in Geoscience Application
Data provenance allows scientists to investigate the origin of an unexpected value. It can be used as a recipe for reproducing output data products. Capturing provenance requires enormous effort from scientists in terms of time and training, as well as the need to design the workflow of the scientific model, i.e., the workflow source, which itself requires both time and training. Scientis...